A simple RNN-plus-highway network for statistical parametric speech synthesis

نویسندگان

  • Xin Wang
  • Shinji Takaki
  • Junichi Yamagishi
چکیده

In this report, we proposes a neural network structure that combines a recurrent neural network (RNN) and a deep highway network. Compared with the highway RNN structures proposed in other studies, the one proposed in this study is simpler since it only concatenates a highway network after a pre-trained RNN. The main idea is to use the ‘iterative unrolled estimation’ of a highway network to finely change the output from the RNN. The experiments on the proposed network structure with a baseline RNN and 7 highway blocks demonstrated that this network performed relatively better than a deep RNN network with a similar mode size. Furthermore, it took less than half the training time of the deep RNN.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An investigation of recurrent neural network architectures for statistical parametric speech synthesis

In this paper, we investigate two different recurrent neural network (RNN) architectures: Elman RNN and recently proposed clockwork RNN [1] for statistical parametric speech synthesis (SPSS). Of late, deep neural networks are being used for SPSS which involve predicting every frame independent of the previous predictions, and hence requires post-processing for ensuring smooth evolution of speec...

متن کامل

Acoustic Modeling in Statistical Parametric Speech Synthesis – from Hmm to Lstm-rnn

Statistical parametric speech synthesis (SPSS) combines an acoustic model and a vocoder to render speech given a text. Typically decision tree-clustered context-dependent hidden Markov models (HMMs) are employed as the acoustic model, which represent a relationship between linguistic and acoustic features. Recently, artificial neural network-based acoustic models, such as deep neural networks, ...

متن کامل

Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices

Acoustic models based on long short-term memory recurrent neural networks (LSTM-RNNs) were applied to statistical parametric speech synthesis (SPSS) and showed significant improvements in naturalness and latency over those based on hidden Markov models (HMMs). This paper describes further optimizations of LSTM-RNN-based SPSS for deployment on mobile devices; weight quantization, multi-frame inf...

متن کامل

Multi-Language Multi-Speaker Acoustic Modeling for LSTM-RNN Based Statistical Parametric Speech Synthesis

Building text-to-speech (TTS) systems requires large amounts of high quality speech recordings and annotations, which is a challenge to collect especially considering the variation in spoken languages around the world. Acoustic modeling techniques that could utilize inhomogeneous data are hence important as they allow us to pool more data for training. This paper presents a long short-term memo...

متن کامل

Contextual Representation using Recurrent Neural Network Hidden State for Statistical Parametric Speech Synthesis

In this paper, we propose to use hidden state vector obtained from recurrent neural network (RNN) as a context vector representation for deep neural network (DNN) based statistical parametric speech synthesis. While in a typical DNN based system, there is a hierarchy of text features from phone level to utterance level, they are usually in 1-hot-k encoded representation. Our hypothesis is that,...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017